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A TIME-DEPENDENT POISSON RANDOM FIELD MODEL FOR 
POLYMORPHISM WITHIN AND BETWEEN TWO RELATED 
BIOLOGICAL SPECIES 

By Amei Amei 1 and Stanley Sawyer 2 

University of Nevada, Las Vegas and Washington University in St. Louis 

We derive a Poisson random field model for population site poly- 
morphisms differences within and between two species that share a 
relatively recent common ancestor. The model can be either equi- 
librium or time inhomogeneous. We first consider a random field of 
Markov chains that describes the fate of a set of individual muta- 
tions. This field is approximated by a Poisson random field from 
which we can make inferences about the amounts of mutation and 
selection that have occurred in the history of observed aligned DNA 
sequences. 

1. Introduction. A traditional goal of population genetics is to under- 
stand the relationship between Darwinian selection and evolution. A par- 
ticular problem is to estimate the distribution of the fitness of genes that 
become fixed in a natural population as a way of determining whether evo- 
lution is going "uphill" or "downhill" in that population (both are possible; 
[7, 18, 19, 29, 30]). 

One approach is to use the numbers of site polymorphisms between and 
within a pair of closely related species at a genetic locus (McDonald and 
Kreitman [31]; see also [4, 9, 12, 15, 18, 19, 24, 25, 42]). Sawyer and Hartl 
[38] developed a Poisson random field (PRF) model for these counts that can 
be used to estimate the amounts of selection that are occurring at individual 
loci ([2, 6, 20, 30, 39]; see also [48]). Multilocus extensions of the model can 
be used for large numbers of different loci [3, 5, 7, 40, 41, 47]. The models 



Received March 2009; revised October 2009. 

1 Supported by Washington University Dissertation Fellowship and NSF Grant DMS- 
01-07420. 

Supported in part by NSF Grant DMS-01-07420. 

AMS 2000 subject classifications. Primary 60J70, 92D10, 92D20; secondary 92D15, 
60F99. 

Key words and phrases. Poisson random field, DNA sequences, diffusion approxima- 
tion, population genetics. 

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Applied Probability, 
2010, Vol. 20, No. 5, 1663-1696. This reprint differs from the original in 
pagination and typographic detail. 



1 



2 



A. AMEI AND S. SAWYER 



assume a high level of recombination between nucleotides and are well suited 
for the analysis of polymorphism and divergence at multiple loci distributed 
across a genome [7]. Other authors have extended the basic PRF model to 
more general biological settings [8, 21, 41, 45-47] and have used numerical 
simulations to study alternative models and the effects of deviations from 
model assumptions [1, 5, 6, 49]. Williamson et al. [47] used a time-dependent 
PRF model to estimate the time since a hypothetical sudden change in 
human population size. The purpose of this paper is to provide a rigorous 
derivation of a time-inhomogeneous PRF model by approximating discrete 
time Markov chains by diffusion processes and, so that the model can be 
applied to data, to derive the corresponding sampling formulas for aligned 
DNA sequences. 

First, a brief introduction to genetics and population genetics. The ge- 
netic material or DNA of most living creatures is located in one or more 
chromosomes. A chromosome can be thought of as a long string of letters 
from the alphabet A, C, G, T, where each letter corresponds to a spe- 
cific nucleotide. Most higher plants and animals are diploid, which means 
that their DNA is arranged in pairs of chromosomes. Each of these paired 
chromosomes typically has double-stranded DNA, amounting to four DNA 
strands for each chromosome pair. The nucleotides on the two DNA strands 
of each chromosome are mirrored as a way of providing redundant informa- 
tion. Creatures with nonpaired chromosomes (such as bacteria and viruses) 
are called haploid. A gene or genetic locus is a relatively-short segment of a 
chromosome that affects a particular trait or set of traits. Random mating 
for a population of M diploid individuals, along with Mendel's laws for the 
offspring of two diploid individuals, can be statistically approximated by 
random sampling with replacement of N = 2M haploid individuals to form 
the next generation. That is, random mating of M diploid individuals is 
approximated by random mating of their N = 2M haploid genomes. 

Proteins are built from peptides, which are strings of amino acids. Peptide 
strings are created by reading a sequence of codons, which are consecutive 
triples of nucleotides, from one or more subsegments of a gene. One amino 
acid is added to the peptide for each codon of DNA. (The amino acids 
and codons have no chemical similarity to one another.) There are around 
twenty different amino acids as opposed to 4 3 = 64 possible codons on one 
DNA strand, so that the mapping of codons to amino acids cannot be one 
to one. About half of amino acids are encoded by four different codons that 
vary in the third codon position. Most of the remaining amino acids are 
encoded by two different codons. A mutation at one of the three sites in a 
codon position is called a silent mutation if the resulting new codon encodes 
the same amino acid. A site mutation that results in a different amino acid is 
called a replacement mutation. Most mutations at first and second position 
sites in codons are replacement, while mutations at the third position can be 
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either silent or replacement. The majority of replacement mutations severely 
damage the gene product and may be lethal to the host, but a substantial 
number have mild to moderate effects and can be advantageous. 

A DNA site is polymorphic in a population if there is more than one nu- 
cleotide at that site. If there is only one nucleotide at that site in a popula- 
tion, the site is monomorphic and the population is fixed at that nucleotide. 

The fitness of a gene is defined as the expected relative number of sur- 
viving offspring (genes) in the next generation that are descendants of that 
gene, assuming constant lifetimes for individuals. Suppose that we have N 
haploid individuals that are of one of two types A or a at a particular site or 
gene, where a is the mutant type and A is the pre-existing or "wild type." 
Assume that a has the fitness coefficient con (a) = 1 + ajy relative to the pre- 
existing A with ujn(A) = 1. If cr at is scaled as o~n ~ l/N for a large haploid 
population size N, then 7 is the selection coefficient of the mutant. The 
mutation is favorable if 7 > 0, detrimental if 7 < and neutral if 7 = 0. 

2. Main results and discussion. We first derive two results that lead to 
Poisson random fields (see below) for the distribution of site polymorphisms 
in a limiting infinitely large random-mating population. 

The population results cannot be directly applied to data, both because 
we cannot sample the entire population and also because the results imply 
that there are an infinitely large number of site polymorphisms in each 
gene, most of which have very small population frequencies (see below). 
However, we can use the population results to derive the distribution of 
sample statistics for aligned DNA sequences arising from two related but 
not too distantly related species. The sample statistics will turn out to be 
independent Poisson with means depending on population parameters for 
mutation and selection rates. These, in turn, lead to likelihood methods for 
the estimation of mutation, selection and divergence time parameters. 

The proofs of the population results are deferred to Sections 4-8. Mathe- 
matically, the key steps in the proofs of both of the main population results 
depend on Trotter's theorem for the dual process of a diffusion process (Sec- 
tion 6). The sampling formulas are discussed in Sections 2.2 and 2.3 with 
the details in Section 3. 

2.1. Population formulas. The first result describes the limiting distri- 
bution of the population proportions of mutant nucleotides at polymorphic 
sites at a single genetic locus in a large population. Under the assumptions 
of Section 1, this distribution is a Poisson random field (PRF) [28] on (0, 1), 
where each point is a population frequency ratio at a different site (see The- 
orem 2.1 below). The second result (Theorem 2.2) describes the expected 
number of polymorphic sites that have become fixed in the population at 



4 



A. AMEI AND S. SAWYER 



the mutant nucleotide since time 0. This random variable is also Poisson 
and is independent of the PRF derived in Theorem 2.1. 

Silent and replacement polymorphic sites are modeled separately and pro- 
vide different information (see below). 

We begin with a model for the frequency of a mutant nucleotide at a 
single site in a finite population of haploid size N. Assume that mutation is 
sufficiently rare at the site level that repeat mutations at the same site can 
be neglected. We use Moran's second model [32] for which, at each step, an 
individual is randomly chosen from the population with equal probabilities 
and replaced by a copy of a second individual (perhaps the same as the first) 
chosen with probability proportional to the fitness. Then the number X k of 
individuals in the population that have the mutant nucleotide at a particular 
site is a Markov chain on SV = {0, 1,2,..., N} with transition function 



p N (i,i + 1) 



(2.1) p N (i,i-l 



(l + a N )i/N{l-i/N) 
1 + a N i/N 

i/N(l-i/N) 



l+a N i/N 

PN(i,i) = 1 ~PN(i,i- 1) -pN(i,i+ 1) 

for < i < N with pjv(0, 0) = pn{N, N) = 1, where 1 + ajy is the fitness of 
the mutant nucleotide a with respect to A. A generation corresponds to N 
time steps. The states and N are traps corresponding to the loss of all 
mutant (a-type) or original (A-type) nucleotides, respectively. We write X£? 
instead of X k when we want to emphasize the size of the state space N. 

We assume that, at time 0, there are Mo different sites that are population 
polymorphic with both mutant and nonmutant nucleotides. There are muta- 
tions at M r new sites at times r = 1, 2, The Mq and M r are independent 

Poisson with E(M r ) = fi N and typically E(M r ) <C £7 (Mo) (r > 1). All muta- 
tions are assumed to occur at new sites. Given Mo and M r , the numbers of 
mutant nucleotides at these sites at times k > in the future (for the initial 
polymorphic sites) and at times k > r (for the new mutations) are given by 
independent Markov chains X\ a k (1 < a < Mo) and X 2 brk (1 — & — M r ), 
each of which has the transition function (2.1). By assumption, X2brr = 1 
for r > 1 and < Xi ja ^,X 2 ,b,r,k < N. 

The number of sites at which there are j mutant nucleotides at time k 
(l<j<N,k>0) is 

(2.2) N k (j) = #{a : X lj0jfc = j} + #{(6, r) : X 2 ^ k = j}, 

where # means cardinality, 1 < a < Mq, 1 < b < M r and 1 < r < k. Thus 
N / ■ \ Mo / Y \ k M r , v x 

< 2 - 3 > E/(i)^)=E/(nf)+EE/(^) 

j=l V 7 o=l V 7 r=l b=l V 7 



A TIME-DEPENDENT POISSON RANDOM FIELD 



5 



for functions f(x) on [0, 1]. The expected value of Nk(j) in (2.2) is 

jv-i k 

(2.4) E(N k (j)) = £ ojfpUhri+mJ^PN^ 1 ^ 

i=l r=l 

where ujf = E(Nq{i)) and p%(i,j) is the kth. matrix power of PN{hj)- The 
random variables Nk(J) are somewhat unintuitive in that each variable de- 
pends on information from many different polymorphic sites, rather than 
one, but turn out to have good statistical properties. 

We assume that the random variables M r and Nq(i) are independent and 
Poisson distributed. Then, by a slight extension of what is called Bartlett's 
theorem [28], this implies that, for each fixed k > 0, the counts Nk(i) (i = 
1,2,...) are independent Poisson and thus define a Poisson random field 
(PRF) on I N = {1/N,2/N, ...,1}. If the mean measures defined by (2.4) 
for y = j/N and k = k^ converge weakly as N — > oo on compact subsets of 
(0, 1], then Nk N (i) for x = i/N converge weakly in the same sense to a PRF 
on (0, 1] . The points of the limiting PRF will be a countably-infinite set of 
diffusion processes on (0, 1] each fixed at a particular time t. 

Each of the Markov chains X\ ta ^, X2.b,r.k can be approximated for large 
N by a diffusion process Xt on (0, 1) with time scaled as t ~ k/N 2 (Section 
4). The diffusion process X t has traps at the endpoints 0, 1, similarly to the 
Markov chains. Specifically, X t is the diffusion process on (0, 1) generated 
by the differential operator 

(2.5) Lx = x (i_ x )^ + 7X (i_ x )|L 5 

where 7 = limjv_ 5 . 00 Na^ for in (2.1). We write L x in the Feller form 

d d 



(2.6) L x -- 

m(dx) s{dx) 

where m{dx) = m'{x) dx and s(dx) = s'{x) dx for 

1 — 6 

(2.7) s(x) = and m(dx) = — rdx. 

7 X{1 — X) 

The function s(x) and measure m(dx) are called the scale and speed mea- 
sure of Xt, respectively. The diffusion process Xt has a smooth symmetric 
transition density p(t,x,y) =p(t,y,x) with respect to m(dx) such that 

E x (f(X t )) = Qtf(x) = E(f(X t ) \X = x) 

(2.8) 



P(t,x,y)f(y)m(dy) 
for /€C[0,1] with /(0) = /(l) = 0. 
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Fix t > and assume that <jjv, A* at, are scaled so that 

(2.9) Na N ->>y, Nfi N ->0 and k N /N 2 ->t asTV^oo. 

As mentioned earlier, each step in the Moran model (2.1) puts one individual 
at risk, so that each individual is put at risk on the average every N time 
steps. Thus Nctn — > 7 and kjy ~ tN 2 mean that selection at intensity 7 is 
applied to each individual in the diffusion time scale t. 

The relation N[j,n —> implies that the rate of arrival of new mutations 
in the diffusion time scale is (N 2 )fiN ~ N9 — > 00. However, the new mutant 
Markov chains begin at Xi i, r r = 1 which corresponds to states 1 /N — > for 
the approximation diffusion. The limiting processes Xt have a trap at x = 0, 
so that one might expect that only 0(1/N) of the new mutant Markov 
chains survive the first few generations. This would suggest that only of 
order 0((1/N)(N 2 hn)) = O(0) of the new mutants would survive, so that 
the rate N^n —¥ in (2.9) is not paradoxical. 

The equilibrium distribution of the limiting PRF of diffusion processes is 

/1N n s(l)-s(x) , s(l) - s(x) 0e> x , 

2.10 ^ ( dx ) = 0^ ym<fa =-4 V^u dx 

s(l) — s(0) x(l — x) s(l) 

([38, 48]; see also Corollary 2.1 below). In particular, the equilibrium mean 
density has a 1/x singularity at x = in a large population but is bounded 
at x = 1. This means that, in the limit, there are an infinite number of sites 
that have polymorphic mutant alleles with population proportions p, > 0. 
By properties of Poisson random fields, 

f 1 s(l)-s(x) 



Thus the vast majority of the pi are too small for the mutant nucleotide to 
show up in finite samples. However, this does show that we need to allow 
for initial limiting PRF mean measures that are not normalizable at zero. 

Specifically, for the initial distribution of polymorphic sites, we assume 
that there exists a Borel measure v(dx) on (0, 1) such that J xv(dx) < 00 
and 

N-l / • \ ■ fl 

< 2 -"> t^AN)N"' = L 

for all g G C[0, 1] for wf = E(N (j)) in (2.4). Since s'(0) = 1 in (2.7), this is 
equivalent to 
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Our first result describes the limiting PRF on (0, 1) which in turn describes 
the population distribution of polymorphic mutant sites. The proof is de- 
ferred to Section 7. 



Theorem 2.1. Assume that aN,0N,k]\r satisfies (2.9) and that Nk{i) 
defined in (2.2) satisfies (2.11) for lo^ = E(No(j)). Then for Qtf(x) in 
(2.8) 



(2.13) 




i=l 

1 Q t f(x)u(dx) + 9 f S ^~ S( $A f(x) - Q t f(x))m(dx) 



/ 8(1) -8(0) 

for any f € C[0, 1] with /(0) = /(l) = such that g(x) = f(x)/x for x > 
extends to a continuous function on [0,1]. 



The limiting PRF mean density in (2.13) is g(t,9, r y,y)m(dy) where 
(2.14) 



p(t,x,y)u(dx) + e B ^\ S ^ 
o s( 1 )-'S(0) 

1 s(l)-s(x) 



I (i\ , n Mt,x,y)m(dx). 
lo s{l)-s{0) 

The first terms on the right in (2.13) and (2.14) are transient terms that are 
due to the initial (or "legacy") polymorphisms at time t = 0. The remaining 
terms are due to new mutations that were introduced at times t > 0. 

If v(dx) in (2.11) and (2.14) is the measure fi0^(dx) in (2.10), then the 
right-hand side of (2.13) is identically L f(x)/j,$ t y(dx), so that ne a (dx) is 
an equilibrium measure. The following corollary shows that fj,g^(dx) is an 
asymptotic measure as well (Section 7.3). 

Corollary 2.1. Let G t (f) be the right-hand side of (2.13) for any 
function f(x) satisfying the conditions of Theorem 2.1. Then 

(2.15) lim Gt(f) = 6 f 1 S M - 8 W f(x)m(dx) = f f(x) M>J (dx). 

t^oo J Q s(l)-s(0) J 

Theorem 2.1 implicitly assumes that all new replacement mutants at a 
given genetic locus have the same selection coefficient 7. The model can be 
extended to allow within-locus random distributions of selective effects of 
new replacement mutants, for example, Gaussian [3, 40, 41], exponential [1] 
or gamma [5]. 
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The second main population result describes the limiting expected num- 
ber of mutant sites that have become fixed in the population at the mutant 
nucleotide by time t > (i.e., such that the nonmutant or "wild type" nu- 
cleotide has been lost). The limit can be expressed in terms of the hitting 
times T a = min{t : Xt = a} for the limiting diffusion process Xt and for its 
dual process Xf, but can also be expressed in terms of the transition density 
of Xt- The proof is deferred to Section 8. 

Theorem 2.2. Under the conditions of Theorem 2.1, the asymptotic 
expected number of mutant sites that have become fixed in the population at 
the mutant nucleotide by time t is 

lim E(N kN (N)) 

N— too 

/JV-1 k N \ 

(2.16) = lim [J2^P k N N (hN) + mJ2P k N~ r ^ N )) 

\ i=l r=l / 

t 



P x (Ti < t)v(dx) + 4tk I MTi < u) du, 



where P in (2.16) refers to the diffusion process Xt conditioned on T\ < Tq, 
which has an entrance boundary at x = (Section 6). 

The right-hand side of (2.16) can also be written (Section 8.2) 
1 



i f i ,1 

s(x)v(dx) — / / p(t,x,y)s(y)m(dy)v(dx) 



o Jo Jo 



8(1) 

(2.17) t i 

+ 6t-0 / q(u,0+,y)s(y)' j! m(dy)du 



Jo Jo 

where q(t, x, y) is the transition density of the dual-process diffusion (Section 
6). 

2.2. McDonald-Kreitman tables. Theorems 2.1 and 2.2 give the distri- 
bution of polymorphic sites and fixations of mutant nucleotides in a large 
population but are not directly applicable to samples from a finite number 
of individuals. 

Suppose that we have random samples from two species that are related 
but not extremely distantly related [31], for example, the Drosophila species 
melanogaster and simulans [7, 40, 41] or humans and chimpanzees [33]. 
Specifically, assume that we have a DNA alignment of m + n sequenced 
genes from the same genetic locus in two species, of which m are randomly 
chosen from one species and n from the second species. The sample can be 
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viewed as an (m + n) x L matrix, where L is the number of nucleotide sites 
in the alignment and the matrix elements are chosen from the four letters 
A, C, G, T, which stand for the four nucleotides in DNA. If mutation is 
sufficiently rare so that it never occurs more than once at the same site, 
then there will be at most two nucleotides at any one site in the population 
and hence at most two letters in any one column of the matrix. 

Given the joint alignment, we say that a site is a fixed difference if it is 
monomorphic in the sample within each species but polymorphic in the two 
species (i.e., monomorphic within each species, but at different nucleotides). 
The site is polymorphic if it is polymorphic within either or both of the two 
species. 

Let K s , K r be the number of silent and replacement fixed differences in the 
joint alignment, and V s ,V r the number of (within-species) polymorphisms 
at silent and replacement sites. These counts can be arranged in the 2x2 
contingency table 



where the column headings D,P refer to fixed differences and polymor- 
phisms and the row headings S, R to silent and replacement mutations, 
respectively. The table (2.18) is called a McDonald-Kreitman table [31] and 
also a DPRS table due to the row and column headings in (2.18). 

A statistically significant excess in the lower-left corner (K r ) of the table 
suggests that one or both species has seen significant positive or directional 
selection at that gene since the two species diverged. In contrast, a statisti- 
cally significant deficit of K r suggests that, instead, most new replacement 
mutations have been subject to strong negative selection. If the sites are as- 
sumed independent, one can apply traditional 2x2 contingency table tests 
to infer an excess or deficit of replacement fixed differences [12, 31]. 

We say that a polymorphic site in the joint alignment is a legacy polymor- 
phism if it is the descendent of a site that was polymorphic in the common 
ancestral species at the time of divergence. In contrast, a new polymorphism 
is a polymorphic site that was caused by a mutation since the time of diver- 
gence in one of the two daughter species. 

An important difference between the two types of polymorphisms is that 
legacy polymorphisms can lead to shared polymorphisms, which are sites 
that are polymorphic in both samples. In contrast, if mutations are suffi- 
ciently rare so that they never occur more than once at the same site, then 
new polymorphisms can cause sample polymorphisms in only one daughter 
species. Another difference is that legacy polymorphisms begin at mutant 
population frequencies strictly between and 1, while new polymorphisms 
begin at mutant population frequency (or 1/N for the Markov chain). 



(2.18) 



S 
R 



D P 

K s V s 
K r V r 



10 



A. AMEI AND S. SAWYER 



Thus legacy polymorphisms are more likely to have multiple copies of both 
the mutant and original nucleotides for relatively recent divergence times. 

An alternative to the 2x2 table in (2.18), that may have more accuracy 
in estimating parameters for recently diverged populations, is 



DOR 



(2.19) S 

R 



K s O s H s 
K r O r H r 

where O s ,O r are numbers of sites in the joint sample that are polymorphic 
in only one sample and H s ,H r the numbers of sites that are polymorphic in 
both samples. Thus V s = O s + H s and V r = O r + H r . A disadvantage of the 
DOHRS table (2.19) is that rare events in which multiple mutations occur 
at the same site may lead to a pair of new polymorphisms being misclassified 
as a legacy polymorphism. On the other hand, the more usual practice of 
counting sites that are polymorphic in both samples as two polymorphic sites 
may misclassify a single legacy polymorphism. (See [43] for other examples 
of higher-dimensional McDonald-Kreitman tables.) 

2.3. The distribution of sampling statistics. For simplicity, we assume 
that the effective population sizes N e , the mutation rates 6 and the selection 
coefficients 7 are the same in the two daughter populations. It then follows 
from Theorem 3.1 in Section 3.3 that, under the assumptions of Theorem 2.1, 
the counts K s ,V s ,K r ,V r in (2.18) and K s ,O s , H s , K r ,O r , H r in (2.19) are 
independent Poisson with means depending on f3 r = (t, 6 r ,j) for replacement 
sites and (3 S = (t,0 s ) at silent sites. If the effective population sizes for the 
two daughter species are different, the resulting formulas in Section 3 are 
easy to modify. 

Let Z a (1 < a < 4) be the counts in (for example) (2.18) in some order. 
Let m a = m a (/3) = E(Z a ) be the corresponding formulas in Section 3.3 for 
P = (f3 r ,f3 s ). Then the likelihood of (2.18) can be written 

(2.20) L(/3, Z) = J] ("exp(-m a (/3))^-m a (/3) z ^ 



or 



(2.21) logL(/3,Z) = C(Z) - X>a(/3) +£> a logm a (/3). 

a=l a=l 

While in principle the parameters (3 r , (3 S can be estimated by maximizing the 
log likelihood (2.21), normally only a few loci in a few species are sufficiently 
polymorphic to allow (5 to be estimated from data from a single locus. More 
typically, likelihoods of the form (2.20) are combined over many loci to 
form a single likelihood. The resulting expression can either be maximized 
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to estimate the model parameters [5, 25, 47] or else analyzed by Bayesian 
methods [1, 3, 7, 40, 41]. 

Although the mutation rate per site per generation is normally assumed to 
be the same for sites in the same genetic locus, which would suggest 8 r > 28 s , 
most replacement mutations are either lethal to their host or else are severely 
deleterious. This means that most replacement mutations are immediately 
lost on the diffusion time scale. This can show up in the model as a cen- 
sored replacement mutation rate with 6 r /6 s < 1 and sometimes 9 r /9 s <C 1. 
In particular, there is no simple relation between 6 S and 6 r in (2.20). 

Simulations have shown that methods based on (2.20) for multilocus data 
are relatively robust to violations of basic model assumptions, such as lack 
of local independence of polymorphic sites [1, 5, 6, 49]. 

2.4. Final comments. These results generalize sampling formulas in Sawyer 
and Hartl [38] who made the approximation that the two species populations 
are individually at equilibrium. Numerical simulations have shown that these 
lead to biased estimates of the population divergence time [1], particularly 
when the divergence time is small, but that estimates of the mutation and 
selection parameters are relatively unbiased. Nonequilibrium results can also 
be applied to situations where a population has been subject in the past to 
an abrupt change in selection parameters or population size (see, e.g., [47]). 

The sampling formulas in Section 3 for the Poisson means in (2.18) and 
(2.19) are more complex than the equilibrium results [38], but likelihoods 
based on these means can be analyzed numerically as in the time homo- 
geneous case. Terms involving diffusion transition densities (Section 3) can 
be estimated by the Crank-Nicholson method [34, 47]. As in [41], Gauss- 
Legendre quadrature can be used for integrals over a finite range. Generaliza- 
tions for random distributions of selection coefficients within loci have been 
handled by Gauss-Hermite quadrature for within-locus normal variation 
[3, 41] and Gauss-Laguerre quadrature for within-locus exponential varia- 
tion [1]. The behavior of the models described in this paper on simulated 
data, and applications to biological DNA sequence data, will be discussed 
in future publications. 

3. Sampling formulas for aligned DNA sequences. Since the two species 
in Section 2.2 are assumed to be relatively closely related on an evolutionary 
time scale, it is natural to assume that they have the same mutation and 
selection rates at each genetic locus and also the same average generation 
times. For simplicity, we also assume that the effective population sizes of 
the two daughter species are the same. This means that the scaled mutation 
rates 6 s ,9 r and selection coefficients 7 are the same at each genetic locus, 
and the scaled time t since the divergence of the two species from their most 
recent common ancestral species is also the same [see (2.9)]. As mentioned 
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earlier, the formulas below can be easily modified if these assumptions are 
violated. 

As in Section 2.2, assume that we have a DNA alignment of m + n se- 
quenced genes from the same genetic locus in two species, of which m are 
randomly chosen from one species and n from the second species. Each of 
the sample statistics K s ,V s ,K r ,V r in (2.18) and K s ,O s ,H s ,K r ,O r ,H r in 
(2.19) are counts of sites of two types, legacy and new polymorphisms, that 
lead to different formulas for the expected counts. 

3.1. Legacy polymorphisms. It follows from Section 7.1 that the distri- 
bution of polymorphic site frequencies pi in a single daughter population 
at time t > that are derived from ancestral legacy polymorphic sites is a 
Poisson random field with mean density 



where f3 = /3 S = (t,9 s ,0) for silent mutations and (3 = f3 r = (t, 9 r ,j) for re- 
placement mutations. It follows similarly from Section 8 that the number of 
population mutant fixations in a single daughter population by time t > 
that are derived from legacy polymorphic sites is Poisson with mean 



Assume that a particular legacy polymorphic site has population fre- 
quency x (0 < x < 1) for the mutant nucleotide at time t = 0. For a sample of 
size n from a daughter population at time t > 0, let I(x, n) be the probability 
that this site is monomorphic in the sample at the wild-type (nonmutant) 
nucleotide, J(x,n) that it is polymorphic in the sample and K(x,n) the 
probability that it is monomorphic at the mutant nucleotide. Then we have 
the following lemma. 

Lemma 3.1. For I(x,n), J(x,n) and K(x,n) defined above, 



(3.1) 






(3.2) 





for T c 



a 



min{a: Xt = a} as in Section 2. 



A TIME-DEPENDENT POISSON RANDOM FIELD 



13 



The first terms in the formulas for / and K are due to population fixations. 
The remaining terms are due to sampling from polymorphic sites. Note that 
I + J + K = \ in Lemma 3.1 and that /, J and K depend implicitly on 
j3= (t, #,7) through both p(t, x, y) and m(dy). 

Proof of Lemma 3.1. For a legacy polymorphism of initial mutant 
frequency x, condition on its population frequency y at time t > 0. □ 

Given a sample of size m + n from the two populations at time t > 0, 
let L\, L2, L3 be the random numbers of legacy polymorphism sites that 
are fixed differences in the sample (Li), polymorphic in only one of the 
two samples (L2) or polymorphic in both samples (£3). Then we have the 
following lemma. 

Lemma 3.2. The random variables L\,L2,L 3 defined above are indepen- 
dent Poisson with means 

E(L 1 ) = C 1 ((3) = [ (I(x,m)K(x,n) + I(x,n)K(x,m))u(dx), 
Jo 

E(L 2 ) = Ci{p) = f\j(x,m)(I(x,n)+K(x,n)) 
Jo 

(3.3) + J(x,n)(I(x, m) + K(x, m))\v (dx) 

1 

( J(x, m) + J(x, n) — 2J(x, m)J(x, n))v(dx), 

E(L 3 ) = C 3 (P)= [ J(x,m)J(x,n)u(dx). 
Jo 

Proof. Consider the random mapping x — > z = {1, 2, 3} corresponding 
to random sampling at time t > for these three outcomes. Then Lemma 
3.2 follows from Lemma 3.1 and Bartlett's theorem [28]. □ 

In particular, it follows from Lemma 3.2 that the expected number of 
polymorphic sites at time t > due to legacy polymorphisms is 

E(L2 + L 3 )= I ( J(x, m) + J(x, n) — J(x, m) J(x, n))v{dx). 
Jo 

The third term in the integrand above corrects for double counting at shared 
p olymor phisms . 

3.2. Polymorphisms from new mutations. It follows from Section 7.2 
that the contribution of new polymorphisms to site polymorphisms at time 
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t > in a single population is a Poisson random field with density 

(3.4) f N (/3,y) = (s(l) - s(y) - j\s(l) - s(x))p(t,x,y)m(dx) 

It follows similarly from Section 8 that the number of mutant fixations in a 
single daughter population due to new polymorphisms is a Poisson random 
variable with mean 

(3.5) G N (p)=0 f P (ri<«)d«. 

In particular, the limiting PRF mean density for population site polymorphic 
frequencies in a single population is y) = /l(/3, v) + In{P, y) as in (2.14), 
and the limiting expected number of mutant fixed differences is G(j3) = 
G L ((3)+G N ((3) as in (2.16). 

For a sample of size n from a single population at time t > 0, let 
(0 < k < n) be the number of new polymorphic sites that have k mutant 
nucleotides. Then we have the following lemma. 

Lemma 3.3. The random variables defined above are independent 
Poisson- distributed random variables with means 

(3.6) E(Z k ) = F N (/3, n, k) = J f N ((3, y) (f\ y\l - y) n - k m(dy) 
for 1 < k < n and /iv(/3,?/) in (3.4)- 

Proof. Consider the random mapping y — > K = {0, 1, . . . , n} defined by 
binomial sampling with parameters y and n at each polymorphic site. The 
range of the mapping is a Poisson random field by Bartlett's theorem [28]. 
Thus the random variables Z^ for 1 < k < n are independent Poisson with 
the means in (3.2). Note that Bartlett's theorem applies here even though 
Zq = oo a.s. and E(Zq) = oo. □ 

Since the random variables Z k in Lemma 3.3 are independent, the ex- 
pected number of polymorphic sites at time t > due to new polymorphisms 
is Poisson with mean 

n-l „i 

(3.7) E N (f3,n) = y2F N ((3,n,k)= f N (f3,y)(l-y n -(l-y) n )m(dy). 

The expected number of sites due to new polymorphisms that are monomor- 
phic in a sample of size n at the mutant nucleotide is (i) the expected number 
of sites that have fixed in the population at the mutant nucleotide by time 
t plus (ii) the expected number of sites that are not fixed in the population 
but are monomorphic at the mutant nucleotide in a sample, which is 

(3.8) D N (p,n) = G N ((3)+F N (P,n,n) 
for G N (P) in (3.5). 
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(3.9) 



3.3. Sampling formulas in DPRS and DOHRS tables. We have now com- 
pleted the proof of the following result. 

Theorem 3.1. Assume that the two species have the same effective pop- 
ulation size N e and the same scaled parameter values f3 r = (i, # r ,7) and 
P s = (t,0 s ,0). Then the counts K s ,O s ,H s ,K r ,O r ,H r in the table (2.19) in 
Section 2 are independent and Poisson distributed with means 

E(K S ) = d(&) + D N {j3 a ,m) + D N (p a ,n), 

E(O s )=C 2 ((3 s ) + E N (ft ,m)+E N (p a ,n), 

E(H s ) = C 3 (p s ), 

E(K r ) = C 1 (p r )+D N (p r ,m) + D N (p r ,n), 
E(O r )=C 2 ((3 r ) + E N ((3 r ,m) + E N ((3 r ,n), 
E(H r )=C 3 (P r ) 
forCiifi) in (3.3), E N (/3,n) in (3.7) and D N (f3) in (3.8). 

Corollary 3.1. The counts K s ,V s ,K r ,V r in the 2 x 2 table (2.18) 
are independent Poisson with means E(V r ) = E(H r ) + E(O r ) and E(V S ) = 
E(H s ) + E(O s ) in (3.9). 

4. Diffusion operators and diffusion approximations. This section de- 
scribes the diffusion approximation [14, 18, 44] of a single Markov chain 
X\ A ,k (k > 0) or X 2 ^,r,k {k > t > 1) in Section 2, which are assumed to have 
the same transition function PN^hj) m (2.1). A major purpose of this sec- 
tion is to show that the transition density p(t,x,y) of the limiting diffusion 
process and its first partial derivative (d/ds(x))p(t, x, y) are smooth for t > 
and < x,y < 1. (The latter will be used in Section 8.2.) 

Each of the Markov chains -X"i Q fc,-^2,6,r,fe wm be approximated by a dif- 
fusion process Xt for continuous t scaled in terms of iV 2 steps of the Markov 
chain, so that t ~ k/N 2 . The limiting diffusion process Xt is determined by 
the differential operator 

d 2 d 

(4.1) Lx = x (i_ 2; )_ + 7X (i_ x )_ 

where 7 = lim/vr^oo Na^ for ajy in (2.1). We write the operator L x in Feller 
form L x = {d/m{dx)){d/ s{dx)) where 

(4.2) six) = and midx) = — r- dx 

7 x{l — x) 

are called the scale and speed measure of L x , respectively [14, 17, 22, 23, 37]. 
At silent sites, and in general if 7 = 0, the scale and speed measure are 
s(x) = x and m(dx) = dx/(x(l — x)). 
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4.1. Green's functions and transition densities. Define 

(a ^ n(rr , A _ (s(l)-s(xVy))(s(xAy)-s(Q)) 

[ 6) 9{x,y)- a(l)-s(0) 

where x V y = max{x, y} and x A y = min{x, y}. Set 

(4.4) B 01 = {/€C[0 ) l]:/(0) = /(l) = 0} > 

where C[0, 1] is the class of continuous functions on < x < 1. Then, for 
any / € C[0, 1], h(x) = f Q g(x,y)f(y)m(dy) is the unique solution h(x) € 
C 2 (0, 1) n B i such that L x h(x) = - f{x) for < x < 1. (For s(x),m(dx) in 

(4.2) , this does no£ imply /i G C 2 [0, 1], nor even that h G C^O, 1].) 

Since s(x) is increasing, g(x,y) <mm{g(x,x),g(y,y)} by (4.3). Hence by 

(4.3) and (4.2) 

(4.5) k(x) = g(x,y) 2 m(dy) <g(x,x) g(y, y)m(dy) < oo 

io Jo 

and /J k(x)m(dx) < (f^ g(x,x)m(dx)) 2 < oo. Thus 

g(x,y) 2 m(dx)m(dy) < oo 



'0 ■/ o 

and g(x,y) is a Hilbert-Schmidt kernel on L 2 (I,m) [36]. This implies that 
there exists a complete orthonormal system of functions a n (x) in L < (I,m) 
such that 

(4.6) / g(x,y)a n (y)m(dy) = f3 n a n (x), / a n (y) 2 m(dy) = 1, 

JO JO 

where XmLi/^i < 00 an d "n(0) = a n (l) = 0. By the same arguments as 
after (4.4), f3 n ^ and the integral equation in (4.6) is equivalent to 

(4.7) L x a n (x) = -\ n a n (x), X n = l/(3 n , a n G C 2 (0, 1) n -Boi- 

Since a n (0) = a n (l) = 0, A n > by (4.1) and J2^=i ^-/Ki < °°- I n particular, 
we can assume < Ai < A n f oo. 

By Cauchy's inequality in (4.6) and by (4.5) and (4.3) 



\u n (x)\ < \ n \/k(x) < Ci\ n ^x(l -x)< Cl\ n . 
Since g(x,y) < g(x,x) and J y/y(l — y)m(dy) < oo, (4.6) implies 
(4.8) \a n (x)\ < C 2 X 2 n g(x,x) < C 3 A 2 x(l - x). 

Since g(x,y)/(x(l — x)) is bounded and continuous by (4.3), it follows from 
(4.6) and (4.8) that a n (x) / (x(l — x)) extend to continuous functions on [0, 1] . 
If 7 = 0, the functions a n (x) are polynomials related to Jacobi polynomials 
[27]. If 7 7^ 0, they are entire functions that are never polynomials. 
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By Mercer's theorem [36], 

a n (x)a n (y) 



(4.9) g{*,v) = Y, 



n=i Xn 

converges absolutely and uniformly for <x, y < 1. The series 

oo 

(4.10) p(t, x,y) = J2 e~ Xnt a n (x)a n (y) 

n=l 

converges uniformly for < x,y < 1 and t > a > since Yl^=i V^n < 00 • 
Thus 

oo 



-A„a a "M a »(j/) 
An 



p(n,x,y)d-u = ^e 

n=l 

converges absolutely and uniformly for a > 0. In particular 

/•oo 

(4.12) 9(x,v)= / p(t,x,y)dt 



with uniform convergence for < x,y < 1. 
It follows from (4.10) and (4.7) that 

p(t + s, x, y) = j p(t,x,z)p(s, z,y)m(dz) and 

(4.13) J ° 

(d/dt)p(t, x, y) = L x p(t, x,y), t > 0, < x, y < 1. 

Choose / S Bqi with f(x) = for < x < c and 1 — c < x < 1 for some c > 0. 
Let u(i,x) = Jq p(t,x,y)h(y)m(dy) for /i(x) = f Q g{x,y)f(y)m(dy). Then by 
(4.11) and (4.12) 

~ e -Ant /-l 

u(t,x) = >J— — a n (^)c n forc n =/ f(y)a n (y)m(dy), 

n=l K J ° 

where the series converges uniformly for < x < 1 and < t < oo by (4.11). 
Thus n(t,x) GC([0,oo) x [0,1]) with 

(d/dt)u(t,x) =L x u(t,x), t>0,0<x< 1, 

u(t, 0) = u(t, 1) = 0, u(0,x) = h(x). 

It follows from maximum principles for parabolic partial differential equa- 
tions [35] that 

(4.14) p(t,x,y)>0, [ p(t,x,z)m(dz) <1 



for < x,y < 1. If u(x,t) = Qth(x) = f p(t,x,y)h(y)m(dy), then 
(4.15) limQ t /(x) = /(x) 

t-5>0 
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uniformly for < x < 1 for a dense set of / € C 2 (0, 1) n Bq±, and hence for 
all / € Bq\ by standard arguments. 

4.2. Diffusion processes and semigroups. It follows from the relations 
(4.13)— (4.15) that there exists a diffusion process Xt with continuous sample 
paths with < Xt < 1 such that 

(4.16) Q t f(x) = E x (f(X t ))= f p(t,x,y)f(y)m(dy) 

Jo 

for / € C[0,1) with /(0) = /(l) = [11, 13, 16, 22, 37]. The process X t 
satisfies < X t < 1 up to the time that it is trapped at one of the endpoints 
0,1, which happens eventually with probability one. The relation 

(4.17) \Qtf(x)\<C 4 e- x ^\\f\\, ||/||= sup \f(y)\ 

0<J/<1 

from (4.10) and (4.8) gives the rate at which Xt is trapped at the endpoints. 
The exit point for Xt is given by the scale function 

, . „ /m m . s(x) — s(0) s(x) 

(4.18) ft(ri<ro) = _U_u = _u 

where T a = min{s : X s = a} [16, 22]. 

It follows from (4.8)-(4.15) that Q t :B 01 ->■ B 01 for all t > for £ i in 
(4.4) and that {Qt} is a strongly continuous semigroup of linear operators 
on B i . 

The infinitesimal generator [10, 36, 44] of a semigroup of linear operators 
Qt on a Banach space B is the linear operator A defined by Ah = f on the 
linear subspace 

V{A) = {h £ B :l[m\\(l/t)(Q t h - h) - f\\ =0 for some / G fij, 

where ||/|| is the norm in the Banach space. If Qt is strongly continuous, 
then T>(A) is dense in B and 

(4.19) ||Q t /||<Me^||/|| all/ £5 

for some real K. If K < as in (4.17), 2?(.A) is the range of the resolvent 
operator Rof = J Q Q t f dt on B with — ^4i2o = ^- This implies Ah = —f if 
h = R f [10, 36, 44]. By (4.12) 

f'OC rl 

(4.20) Rof{x)= Q t f(x)dt= g(x,y)f(y)m(dy) 

Jo Jo 

is the Green's operator defined by g{x,y). 

A core of a strongly-continuous semigroup Qt with i<C < is a subset 
C C £>(A) such that l? c = -A(C) is dense in 5. Since i?o is one-one on B, this 
is equivalent to specifying a dense subset B C C B and setting C = Rq(B c ). 
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4.3. Diffusion approximations and Trotter's theorem. Let Xj^ be the 
Markov chain defined by the Moran model (2.1). Define = X^/N, so 
that < Yf* < 1. It follows fr om standard arguments and (2.1) that 

Theorem 4.1. Let i^ be integers such that < ijy < X and such that 
X N = "In/N x for some x, < x < 1. Then for any 5 > 

lim N 2 E iN (Yf - x N ) = 7 x(l - x), 

N— s-oo 

(4.21) lim N 2 E iN ((Y? - x N ) 2 ) = 2x(l - x), 

N-*OC 

lim N 2 E iN {\Y™ - XN \ 2+5 )=0. 

7V->oo 

Since the functions on the right-hand side of (4.21) are continuous, con- 
vergence in (4.21) is equivalent to uniform convergence in x for %n = [Nx]. 
By Taylor's theorem 

lim N 2 E iN (h(Y l N )-h{x N )) 
TV— >oo 

(4.22) 

= L x h(x) = x(l — x)h"(x) + 7x(l — x)h (x) 
uniformly for < x < 1 for any h G C 2 [0, 1]. Then by Trotter's theorem [44]: 



Theorem 4.2. For Y^ N as above, ijy = [Nx], and the diffusion process 
X t in (4.16), 

(4.23) lim E iN (f(Y [NH] ))=E x (f(X t )) = Q t f(x) 

uniformly for < x < 1 for any f € C[0, 1] wii/i /(0) = /(l) = 0. T/ie con- 
vergence is also uniform for < t < T for any T > 0. 

We cannot apply (4.22) for h € C 2 [0, 1] directly for Trotter's theorem, 
since in this case there exist h € T>{A) with h £ C 1 [0, 1], let alone C 2 [0, 1]. 
However, it is sufficient to verify (4.22) for all h in a core for A [44]. If 
C = Rq{B c ) where B c is the set of all function / G Bqi such that f(x) = 
for < x < a and 1 — a < x < 1 for some a > 0, then C is such a core. 

The result (4.23) also holds for / € C[0, 1] without the conditions /(0) = 
/(l) = with an appropriate modification of the definition of Qtf(x). See 
Corollary 5.1 in Section 5 below. 

Thus, after suitable rescaling, the Markov chains {-X"i,a,fc)^2,&,r,fc} in Sec- 
tion 2 converge in distribution to diffusion processes {Xt} with infinitesimal 
generator (2.5) and scale and speed measure (2.7) in Section 2. 
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5. Exit probabilities for Markov chains. Let Xj? be the Moran-model 
Markov chain defined by (2.1). Recall that Non -^7 as N — > oo by (2.9). 
Then we have the following lemma. 

Lemma 5.1. Let i, m be integers such that 1 < i < m < N . Then 

Pi(T»<T»)= 1 l ~^; jv , ) ;i 

1 — (1 + (Tn) 

with the right-hand side replaced by i/m if on = 0. 

Proof. By (2.1), p N {i,i + l)/p N (i,i- 1) = 1 + on for < i < N. Since 
we can ignore "wait states" with X^ +1 = , (5.1) follows from the classical 
Gambler's Ruin problem (see, e.g., [23], pages 50, 92-94, and Moran [32]). 
(See Lemma 8.2 for a second proof.) □ 

As a consequence of Lemma 5.1: 

Lemma 5.2. Let in be integers such that < i/v < N and iAr/iV — > x for 
some x, < x < 1. Then 

N ^ ^TVx D ,m ^ rji \ S(x) - S(0) s(x) 



(5.2) A itoF i „W<^) = p,(r 1 <r ) s(1) _ s(0) s(l) 

forTo,Ti in (4-18) and s(x) in (4-2). If in = iN( x ) = [Nx], the convergence 
in (5.2) is uniform in x for < x < 1. 

Proof. If o n / and 7 / 0, it follows from (5.1) that 

(53) P (T N C T N ) 1 -( l + N ^/N)^ 1-e-r* 

(5.3) F lN {l N <1 )- 1 _ {1 + N(Tn/n) -n -+ x _ e - 7 

as N — > 00. The proof is similar if 7 = 0. □ 

Lemma 5.2 can be used to extend Theorem 4.2 to all / £ C[0, 1]: 

Corollary 5.1. Assume f E C[0, 1]. lei and sei = X^/iV as 
in Theorem 4-1- Set iN = [Nx]. Then 

(5.4) lim E iN {f{Y [NH] ))=E x {f{X t )) = Q t f{x) 
uniformly for < x < 1 , where 

(5.5) g t /(x) = /(0)P a ,(T o <t)+ / ' P(t,x,y)f(y)m(dy) + /(l^C?! < i). 

JO 
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Proof. Any / e C[0, 1] can be written 

/(») = 9 (x) + /(o)ifl^§ + /(D^fl^i, 

s(l)-a(0) s(l)-a(0) 

where g(0) = 5(1) = 0, so that g £ Bqi. Since (5.4) holds for g G Bqi by 
Theorem 4.2, it only remains to prove (5.4) for f{x) = s{x). However, 

(5.6) h N (i N ) = P iN {T§ < T N ) = EiMX?)) = E iN (h N (X» H] )) 

for all t > 0. Lemma 5.2 applied to both sides of (5.6) implies (5.4) for 
f(x) = s(x). □ 

A stronger result than Lemma 5.2 is the "local limit theorem." 

Lemma 5.3. Let ijy be integers with 1 < ijn < N and set xn = ijv/iV. 
Then 

(5 7) lim ?m^EKjSlll = 1 

N^oc s(xn)/s(1) 

Similarly, if <in < N — 1 and x^ = i^/N , then 
(K. q\ 1:™ 1 ~ P i N ( T N < ) _ -1 

{ ' JV->oo(s(l)-a(xjv))/s(l) 

The most important cases of Lemma 5.3 are when xn — ^ or xjv — > 1. If 
%N — iN( x ) = min{[iVx] + 1, 1}, then (5.7) holds uniformly in x for < x < 1. 
Similarly, (5.8) holds uniformly in x if i^ = iN( x ) = [Nx]. 

PROOF of Lemma 5.3. The ratio in (5.7) can be written 
l-(l + a N )- iN \ ( 1-e-f 



-ixn l\ 1 _ (1 +a N )- N 

(5.9) 

' f* N e~ yN lo s( 1+ ^) dy \ ( J 1 e-^ dy 



J XN em dy J\ e - 2 /7Viog(i+ ( x J v) dy ; ' 

where the first line of (5.9) holds for 7 7^ and the second line for all 7. 
Since Na^ — > 7, then AT log (1 + ctjv) — > 7 and the ratios converge uniformly 
in xn- Thus (5.7) follows from (5.9). Similarly, 

s(l) - a(a;) _ e"^ - e"T _ e^ 1 "^ - 1 
Px (T ° < Tl } " s(l)-s(0) " l-e-7 ~ e 7 - 1 

with a corresponding relation for h^(i). Then the ratio in (5.8) can be 
written 

' fi~ XN e ^l°g(i+^) dy\ ( ll dy 



j^- XN &n dy J\ fl eV N tag(i+°w) dy 
with a similar conclusion. □ 
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6. Dual Markov chains and dual diffusion processes. Define /ijv(i) = 
Pi(T$ < Tq ) as in (5.1) where, as before, P{ means conditional on Xq = i. 
For p N (i,j) = Pi(X^ = j) in (2.1) and 1 < i < N, define 

(6.1) q N (i,j) = Pi(X?=j | If < T Q N ) = -L-p N (iJ)h N (j). 

n N (i) 

Since Ylj=iQN(hj) = 1 for 1 < i < iV, (lN{hj) defines a Markov chain {X^} 
on Sn = {1, 2, . . . , N} that never attains X^ = and has iV as an absorbing 
boundary. Similarly 

Ek(*(X», . . . ,X?)) = Et(*(X», . . . ,Xf ) j Ti < To) 

for all functions <&(ji, . . . , jk) on (Sat)* 1 . The chain X^ [or f/7v(z,j)] can be 
called an h-process of [26]. 

6.1. The limiting dual diffusion process. We use the formula for /i/v(i) in 
Lemma 5.1 to find a diffusion approximation for {Xf}. Define = X^ /N, 

so that < Yj* < 1. The analog of Theorem 4.1 is: 

Theorem 6.1. Let i^ be integers such that 1 < i^ < iV and xn = 
In/N — > x where < x < 1- Then, for any 5 > 0, 



1 + e-T* 



lim JV^p-f - x,v | T$ < T») = b(x) = 7 x(l - x)- 
N->oo 1 — e ' 

(6.2) lim I^EiJiYf - x N ) 2 \ T$ < T N ) = 2s (1 - x), 
lim iV 2 £ ijv (|ff - x^vl 2 ^ | < 2^) = 0, 

N— >oo 

where b(x) = 2 if x = 0. If ipj = in(x) = min{[Nx] + 1,1}, i/ie convergence 
is uniform in x. 

Proof. By (2.1) and Lemma 5.1, writing j = ]n = i>N for ease of nota- 
tion, 

N 2 E,(Y l N -x N \T^<T N ) 



Nj/N(l-j/N) 
1 + a N j/N 

j/N(l-j/N)N 
1 + a N j/N 

► b(x) = 7x(l — x) 



(1 + <r N )(l - (1 + a N r^ 1 ) -(!-(! + on) 
1 - (1 + o N )-i 

o- N + o N (l + o N )~ j 



-j+i) 



1 - (1 + o N )-i 
1 + 

1 - e-i x ' 
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Similarly 

N 2 E j ((y»-x N ) 2 \Tg<I* r ) 



j/N(l-j/N) 
1 + a N j/N 

j/N(l-j/N) 

1 + a N j/N 
>2x(l-x) 



(1 + a N )(l - (1 + cTjy)-^ 1 ) + (!-(! + a N )- J+1 ) 
l-(l + a N )-i 

2 + a N -{2 + a N )(l + a N )-j ' 



and 



N'EjdYf -x N 



\2+S I \rpN , 



^iV 2 ^ /Ar ((yf - x N f | Tjv < T N ) -> 0. 



□ 



As in Section 4, Theorem 6.1 implies, by Taylor's theorem, that 

-o 



lim N 2 Edh{Yf) - h(x N ) \ T$ < 

iV— >oo 



(6.3) 

= L x 7i(x) = x(l - x)h"(x) + b{x)h'{x) 

uniformly for < x < 1 for any /i € C 2 [0, 1] for defined in (6.2). 

As suggested by qN(hj) = hN(i)~ 1 PN(i,h)hN(j) in (6.1), the operator L a 
in (6.3) satisfies 



L x h(x) 



s{x) 



L x (sh){x) 



< x < 1. 



The operator L can be written in Feller form as 

d d 



(6.4) 



L x h(x) 



m{dx) s{dx) 



h(x) 



for scale and speed measure 



(6.5) 



s(x) 



1 



s(x) 



and m(dx) = s(x) 2 m(dx) 



for s(x) and m{dx) in Section 4. 

Let At be the diffusion process in (0, 1) generated by L x . Since 

lim s(x) = — oo and / \s(x)\m(dx) < oo 

the boundary point is an entrance boundary for or X% [16, 22, 23]. 
Since is an entrance boundary, P x (Tq < oo) = for x > where T a = 
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min{i > : Xt = a} (i.e., is inaccessible for Xt), but 

limE x (f(X t )) = E (f(X t )) 

x>0 
x^O 

exists for all / 6 C[0, 1] with a probability distribution Po(Xt € dx) on (0, 1] 
for t > 0. Thus Po(-^t > 0) = 1 for any i > and, given Xq = 0, Xt leaves 
immediately [22]. 

The Green function for (6.4) and (6.5) is 

(s(l) - s(x V y))(s(a; Ay) - s(e)) 



g(x,y) = lim 



e^o s(l) — s(e) 

1 



(6.6) = s{l) -s(xVy) 



s(xVy) s(l) 
(s(l)-s(xVy)(s(xA7/)-s(0)) #0r,y) 



s(l)s(x V y)s(cc A y) s(x)s(y) 

by (4.3). Since ffg(x,y) 2 m(dx)fh(dy) = J J g(x,y) 2 m(dx)m(dy) < oo, the 
kernel g(x,y) is Hilbert-Schmidt with respect to m(dx). The eigenfunction 
equation 



i ! .1 



g(x,y)a(y)m(dy) = —— / g(x,y)s(y)a(y)m(dy) = f3a(x) 
o JO 

is the same as (4.6) for a(x) = s(x)5(x), so that we can take 5 n (a;) 
a n (x)/s{x) with the same eigenvalues X n . Define 

(6.7) Bi = {/eC[0,l]:/(l) = 0} 

and 

q(t, x ,y) = Y^ e- Xnt a n (x)a n (y) = » 



(6.8) Qtf{x)=E x {f{X t ))= q(t,x,y)f(y)m(dy), 

Jo 

~i \ f°° , \ i g(x,y) 
g{x,y)= q{t,x,y)dt= 

Jo s{x)s{y) 

For feB u by (6.8) and (4.8), 

(6.9) |g t /(*)| < C 5 e- Al *||/||, ll/H = sup |/(y)|. 

0<y<l 

Since a n (x) = a n (x)/s(x) £ B\ by the discussion in Section 4, the operators 

(6.10) Q t f(x) = -L f p(t,x,y)s(y)f(y)m(dy) = -±-Q t ( s f)(x) 

s{x) Jo s{x) 

preserve B±. The principal result of this section is presented in Section 6.2. 
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6.2. Trotter's theorem for the dual process. Let X^f and qN{hj) be for 
the dual Markov chain in (6.1) and set Y^ =Xj*/N. 

As in Section 4, the relation (6.3) defining Lh with uniform convergence 
for h G C 2 [0, 1] does not cover all h G T)(A) = Rq(Bi) since T>(A) contains 
functions h ^ C 1 [0,1]. However, if B c is the set of all / G B\ such that 
f{x) = b for < x < a and f(x) = for 1 — a < x < 1 for constants a, b with 
a > 0, then (6.3) holds for all /i in the core C = R{B C ). Then by Trotter's 
theorem [44]: 

Theorem 6.2. Let i^ be integers such that 1 < ijy < -/V and in /N — > x 
for some x, < x < 1. Then 

(6-11) lim E t Jf(Y^ 2t] ))=E x (f(X t )) = Q t f(x) 

iv— >oo 1 1 

for any f € C[0, 1] with /(l) =0. // ijv = ^v(z) = min{[iVx] + 1, 1}, the 
convergence is uniform in x, and is also uniform in t for < t < T for any 
T>0. 



A weaker version of (6.11) could be obtained from Theorem 4.2 directly: 
By (6.1) 



N-l 



£U/(^)) = E^(^')/u 

N-l 



.12) 



h N (i N ) 
s(x) 



j'=i 



N-l 



hN(lN) s(x) 



s(i/iV) \N J J \N 



where 1 < ijv < iV and iiv/iV ->■ x > 0. Now 5(2;) = s(x)f(x) G i?oi if / G JBi, 
and limN^oohjsri^/s^/N) = s(l) uniformly for 1 < i < iV by Lemma 5.3. 
Hence by Theorem 4.2 



lim E iN tf{Y{* H] )) 



1 



s(rr) 
1 



Qt(sf)(x) 
1 



s(x) 



p(t,x,y)s(y)f{y)m(dy) 



uniformly for s(x) > a for any a > 0. However, this argument does not extend 
to uniform convergence for < x < 1 nor to x = 0. 
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7. The limiting distribution of polymorphic sites. The purpose of this 
section is to prove Theorem 2.1 and Corollary 2.1 in Section 2. By (2.3), the 
expected value of the left-hand side of (2.13) is 

/N-l , . N \ /M /X N \ k N M r /X N \ 

<™> £ (E/(s)*»( i ))= £ (E/(^)+EE/(^) 

7.1. The first term in (7.1) (legacy polymorphisms) . The expected value 
of the sum in (7.1) that corresponds to legacy polymorphisms is 

/M /x N \\ N ~ l N_1 



( ° /X \\ / ' \ 

JV-l N-l , . v 

i=l 3=1 ^ ' 

/V-l 



By (6.1) 



i=i 

jV-l 

Qf/« = E^(*,i)/(^ 

3=1 

JV-l 



(T.2) __£ A(4 j )Mfl/ (i) 
where (h N f)(x) = h N ([Nx])f(x). Thus 

JV-l JV-l , 



E *W/(0 = E moo? ( if 1 ) (Owf 



i=i i=i 

(7.3) 

JV-l 



E(MW£)»^K 



'/V 



By Lemma 5.3, liniAr^oo h^i^N)/ s{in /N) = l/s(l) uniformly in x for 
iN(x) = [Nx], and we can write 

(7 . 4) lMl= g (i/ N) - g (i/ N )(i imm 



h N {i) V MO 
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for g(y) = s(l)f(y)/s(y). By the assumptions of Theorem 2.1, g(y) extends 
to a continuous function on C[0, 1] with g(l) = 0, so by Theorem 6.2 

(7.5) hm Qf NH] g{[Nx]) = Q t g{x) 
uniformly for < x < 1 and < t < T for any T > 0. By (2.12) 

(7.6) ^lim g g (£) s (£) uf = ^ *( V MyM<fo) ■ 
Thus by (7.3) and (7.5) 

hm J2co?Q»f(i)= Q t ( s (l)f/ s )(y) S -^lu(dy) 



QtfivHdy). 

This completes the proof of Theorem 2.1 for the legacy terms in (7.1). 



o 



7.2. The second term in (7.1) (new mutations). The expected value of 
the double sum in (7.1), which corresponds to new mutations, is 

(k N M r , yN \\k N , , yN \ \ 

EE/pf^)) = E W)a (/(%*)) 

k N 

(7-8) ^^C/W 

r=l 
fcjv — 1 

r=0 

where Q?/(l) = E.liWl, j)/07^)- As in (7-2) 

fcjv — 1 fejv~l 

MJV Qr 7V /(l) = M^(l) 2 QrUlh N ){l) 

r=0 r=0 

(7.9) 

JV 



(Nfi N )(Nh N (l)) / Oj^„](//M(l)^ 



where (f/h N )(y) = f(y)/h N ([Ny] + 1). 

As iV ->■ oo, Nfi N -> 6> and k N /N 2 ->■ i < oo by (2.9) and, since s'(0) = 1, 
iWijv(l) -»■ V^ 1 ) by Lemma 5.3. Thus by (7.9) and (7.4) and (7.5) 

k N -l 



(7.10) hm ^ £ Q?/(l) = / 



Q u g{Q) du, 
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where g(y) = s(l)f(y)/s(y). By (6.6) and (6.9), the Green operator 

Gf(x) = Qtf(x)dt= g(x,y)f(y)m(dy) 
Jo Jo 

is bounded in the supremum norm, and 
9 



8(1) 



t _ 
Q u g(0)du 



o 



s(l) 



Jo 



I ____ />oo ^ 

Q u g(0) du - J Q u g{0) du 



Qu{g - Q t g){o)du 
1 1 



s(l) J \s(y) 8(1) 
' 1 s(l)- S (y) 



(g(y) - Qtg(y))m(dy) 



8(1) J s(l)s(y) 



(g(y) - Qtg(y))s(y) m(dy). 



S nl S M (f(y) ~ Qtf(y))m(dy). 



Since Q t g(y) = (1/ s(y))Q t (sg) by (6.10) and g(y) = s(l)f(y)/s(y), we con- 
clude g(y) - Q t g(y) = (s(l)/s(y))(f(y) - Qtf(y)) and hence 

-7TT / Q u g(o) du = e 

si 1 ) Jo Jo a(l)-a(0) 

This is the second term on the right-hand side of (2.13) and completes the 
proof of Theorem 2.1. 

7.3. Proof of Corollary 2.1. The first term j Q l Q t f(y)v(dy) in the last 
line in (2.13) equals 

(7.11) / Qt(f/s)(y)8(y)u(dy) 

Jo 

as in (7.7) or (6.10), where f(y)/s(y) is bounded and L 1 s(y)v(dy) < oo. 
Thus the integral in (7.11) is 0(e~ Xlt ) by (6.9) and converges to zero as 
t — > oo. By the same argument applied with (s(l) — s(x))m(dx) in place of 
u(dx), 

t ^~ S ^ Qtf(^(dx) = 0(e-^) 

Jo 8(1) -8(0) 

as t — y oo as well. This completes the proof of Corollary 2.1. 
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8. The limiting numbers of fixations. 

8.1. Proof of Theorem 2.2. By Bartlett's theorem [28], the number of 
processes {X^^.} and {X 2: b,r,k} that have been trapped at state N by time 
k is Poisson with mean 

N-i fe-i 

(8.1) E(N k (N)) = J2^PN(i,N) + mJ2PN^ N )- 

i=l r=0 

The first term on the right-hand side of (8.1) corresponds to legacy poly- 
morphisms and the second to new polymorphisms. The proof of Theorem 
2.2 for both terms depends on: 

Lemma 8.1. Let i^ be integers such that 1 < ijy < N and in /N — > x for 
some x with < x < 1. Then 

(8.2) to^Pi* [^2 T N <t\T^< T A = P X (T! <t\n< T ). 

We first show how Lemma 8.1 implies Theorem 2.2. 

Proof of Theorem 2.2 given Lemma 8.1. The first sum in (8.1) 
can be written 

N-l n-i k 

.N 



■3) g^fcft*)- 



Let i = in be integers with 1 < in < N and i/N — > x, and assume k/N 2 — > t 
as in Section 2. Then 

p k N (i,N) h N (i) N N N 

:^i{ A k - iV I ^TV < 1 ) 



s(i/N) s(i/N)' 



M0 Wii#< k 



s(i/N) J \N 2 1V ~ N 2 
1 



rpN , rpN 

I N < J 



-PxiTtKtlTiKTt 



0) 



s(iy 

by Lemma 5.3 and (8.2), with uniform convergence for < x < 1 if ijv 
ijv(x) = min{[iVx] + 1, 1}. Thus by (8.3) and (2.12) 

N-i x 
lim V uf p%{i, N) = — P X (T X <t\T x < T )s(x)v(dx) 

N^oo ^ s(l) Jo 



(8.4) 



/ P as (Ti<t|Ti<r )P a .(ri<To)i/(da:) 

JO 
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•1 



P x (Ti<t)v(dx). 







The second term on the right-hand side of (8.1) is 
fc-i 

AwJ>fr(l,iV) 



r=0 



fc-1 



(8.5) 



Mh N (l) Y,Pi(Tn <r\T$ < T c 

-k/N 2 



N \ 



r=0 



{N m )(Nh N (l)) 



T N < 
N - 



N 2 



[N 2 u] 
N 2 



T$ < T N | du 



7T\ [ i%(Ti<u|Ti<r )du 



by Lemma 8.1, since N/j,n — > and N}in(1) —¥ l/s(l) as in (7.9). Combining 
(8.4) and (8.5) completes the proof of Theorem 2.2. □ 

Proof of Lemma 8.1. Assume 1 < in < N and in /N — > x for < x < 
1 as before. Then by Theorem 6.2 

lim E iN (f(Y [NH] )) = E x (f(X t )), 

N— too L ' 

if / G C[0, 1], < /(x) < 1, and f(l) = 0. Since iV is a trap, this implies 



(8.6) 
and 



£ x (ri)<liminf£ ijv 

iV— >oo 



1 



^(Ti > t) < liminf P iJV ( -^T^ > t 



iV 2 " 
1 



N 2±N 

by two applications of Fatou's theorem. If we are able to prove 



(8.7) 



E X {T X 



lim E. 

N~>oo 



in 



1 

N 2 



Tn ) < °°> 



then a standard compactness argument for weak convergence would imply 
equality in (8.6) with liminf replaced by lim, which would prove Lemma 
8.1. Hence it is sufficient to prove (8.7). 

Consider an arbitrary birth-and-death Markov chain X n on the state 
space S = {0, 1, . . . , N} with absorbing endpoints. As in the Moran model 
in (2.1), assume that the transition function can be written 

(qi--j = i + i, 

(8.8) pij = I n:j=i, 

[pi-.j = i-l, 
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where p o = Pnn = 1, Pi + n + % = 1, and pi, qi > for 1 < i < N (see, e.g., 
[23], pages 50, 92-94). Then we have the following lemma. 

Lemma 8.2. Let gij = Yl^=oPij where p^ are the matrix powers of p 
in (8.8). Then 



1^/(14 



h i = P l (T N <T ) = 

(8.9) 

, hi/xj{\ — hivj) 

9ij = A N 

for 1 < i,j < N — 1, i A j = min{i, j} and i V j = max{i, j}. Here 
j ;-i 

(8.10) Qj = n(Pfc/9*) and A i = J2 a i> (l<i<N). 

k=l j=0 

Proof. The probabilities hi = PiiT^ < Tq) satisfy the recurrence hi = 
Ei(hxx) = Pihi-i + nhi + qih i+ \ for < i < N. This implies h i+ i - hi = 
(Pi/li)(hi — hi-i) and hence hi + \ — hi = oti(h\ — ho). By summation, hi — ho = 
Ai(hi — ho). The conditions /io = and = 1 imply h\ = l/A^ and hence 
hi = Ai/A N . 

The Green matrix gij satisfies the recurrence 

N 

^PirQrj =Pi9i-l,j + n9ij + QiQi+1,3 = 9ij ~ <>ij 
j=0 

and thus g i+1)j - = (pi/qi)(gij - gi-ij) - 5%j /qj- The boundary conditions 
9oj — 9Nj = for < j < N and arguments similar to those for hi lead to 
the formula for g^. □ 

For the Moran model (2.1), Pk/qk = 1/(1 + °a0; «j = (1 + o~n)~^, and 

^ = - (i + Thus b y ( 8 - 9 ) 

(8.11) MO = Pi{T§ < T») = A = + 

Aat 1-(1+(TjvJ 

which gives a second derivation of Lemma 5.1. 

If /(») = 1 for < i < N and /(0) = f(N) = 0, then 

/-'jv 1 \ oo 



-AT : 



^(r^) = ^( £ /(x fc ))=5>((/(**)) 

5.12) 



fc=0 / fc=0 



N-l 

9N(i,j), 

3=1 
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where gN{hj) = Y1^=oPn(^3)- Then by (8.12) for the dual Markov chain in 
(6.1) and Lemma 8.2 

/ 1 \ 1 JV_1 1 1 N ~ l 

v 7 j=i v ' j=i 

- ^ 1 DM'AjKi-M^j)))^ 01 



Nh N (i)N ^ JVV JM JVV J '" a jQj 



(8.13) 



l_(l + ajV )-^v 
a N Nh N {i) 

N-l 

l 

X 



E (( X + aN jj) (M* A - h N (iVj))) 

(l + a N y 

x h N (j)- 



j/N(l-j/N) 

by (2.1) and (8.9), since <x,- = (1 + o-n)~ 3 '. By Lemma 5.3 

tv^o s(i/N)(s(l) - s(j/N)) s(l) 2 

uniformly for 1 < i, j < N — 1. Since o~jv ~ t/^V by (2.9), the terms in the sum 
in (8.13) are uniformly bounded. Then by Lemma 5.3 again, with i/N — > x 
and j/N -> y in (8.13), 



8.14 = / ^ I -dy 

7 J s[x) J s(l) 2 -y) 



1 



g{x,y)s(y)m(dy) = / g(x,y)m(dy) = E X (T X ) 



s{x) Jo 

for g(x,y) in (4.3), m{dx) in (4.2), g(x,y) in (6.8) and fh(dx) in (6.5). This 
completes the proof of (8.7) and hence of Lemma 8.1 and Theorem 2.2. □ 

8.2. Proof o/ a/ter Theorem 2.2. For fixed i > 

P*(Ti<*) = l-P a! (ri><) = l- / q(t,x,y)m(dy) 

(8-15) x 7 ° 

= l-[ P -^s(y )m (dy) 
Jo S W 
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by (6.8). Thus 

Px{Ti <t) = P X {T X < T )P :c (T 1 <t\Tx<T 
= (s(x)/s(l))P x (T 1 <t) 
1 



\ S ^~J p(*' x 'y) s (y) m (^) )• 



By (4.3), (4.8) and (4.6) 
da n (x) 



ds(x) 



< A„ / \a n (y)\m(dy)<C 3 \ d n 
f o 



and 
(8.16) 



ds(x) 



71=1 



da n (x) 
ds(x) 



Oi n {y) 



converges uniformly for t > a > and < x, y < 1. Thus by (8.15) 
Po(Ti < t) = 1 - lim ^%Ml s (y)m(dy) 



5.17) 



a-^-ojQ s(a) 
1 9 



o 5s(x; 



p(t,0+,y)s(y)m(dy) 



q(t,0,y)s(y) 2 m(dy). 
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